

Author Response for The Unreasonable Effectiveness of Big Models for Semi-Supervised Learning

Neural Information Processing Systems

We thank the reviewers for their feedback and for their efforts in reviewing. We respond to each comment below. On the concern that "overall, there is no significant contribution to unsupervised pre-training": our main contribution is a detailed procedure rather than a theorem, architecture, or other artifact, but we believe our contributions are significant. Indeed, R3 recognizes that "the simple semi-supervised framework is still valuable for industry and future works" and that it "will inspire several future works." These results can be further improved with better augmentations during fine-tuning and an extra distillation step.
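For context on the distillation step mentioned in the response: distilling a fine-tuned big model into a student over unlabeled data reduces to a soft-label cross-entropy between the two networks' output distributions. A minimal PyTorch sketch, where the function and variable names and the temperature default are illustrative assumptions rather than the authors' actual code:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft-label distillation: cross-entropy between the teacher's and
    student's predictive distributions, computed on unlabeled images."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Expected cross-entropy under the teacher's distribution.
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

def distill_step(student, teacher, unlabeled_images, optimizer, temperature=1.0):
    """One illustrative training step: the teacher (the fine-tuned big
    model) stays frozen; only the smaller student is updated."""
    with torch.no_grad():
        teacher_logits = teacher(unlabeled_images)
    loss = distillation_loss(student(unlabeled_images), teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```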


How IBM CEO Arvind Krishna Is Thinking About AI and Quantum Computing

TIME - Tech

IBM was one of the giants of 20th-century computing. It helped design the modern PC, and created the first AI to defeat a human champion in the game of chess. But when you think of AI, IBM might not be the first, or even the tenth, company to spring to mind. "We are a B2B company, and explaining what we do to the average reader--we'll take all the help we can get," IBM CEO Arvind Krishna joked ahead of a recent interview with TIME. IBM does indeed build AI models--not massive ones like OpenAI's GPT-4o or Google's Gemini, but smaller ones designed for use in high-stakes settings, where accuracy is at a premium.


Review for NeurIPS paper: Big Self-Supervised Models are Strong Semi-Supervised Learners

Neural Information Processing Systems

Weaknesses: - Most major parts of this work, such as distillation and fine-tuning, were proposed in previous works. Although the authors improve SimCLR and propose SimCLRv2, the novelty of the individual parts is somewhat limited. However, I think the simple semi-supervised framework is still valuable for industry and for future work. It explicitly points out a previously ignored paradigm in semi-supervised visual learning, where regularization-based methods dominate. I think it will inspire several future works following this paradigm.


Architectural Foundations for the Large Language Model Infrastructures

Zhu, Hongyin

arXiv.org Artificial Intelligence

The development of a large language model (LLM) infrastructure is a pivotal undertaking in artificial intelligence. This paper explores the intricate landscape of LLM infrastructure, software, and data management. By analyzing these core components, we highlight the key considerations and safeguards crucial for successful LLM development. This work presents a concise synthesis of the challenges and strategies inherent in constructing a robust and effective LLM infrastructure, offering valuable insights for researchers and practitioners alike. In the endeavor of constructing a robust and expansive LLM infrastructure [1], the central challenges encompass the trifecta of infrastructure, software, and data.


Challenges and Responses in the Practice of Large Language Models

Zhu, Hongyin

arXiv.org Artificial Intelligence

This paper meticulously curates questions that are both thought-provoking and practically relevant, providing nuanced and insightful answers to each. To facilitate readers' understanding and reference, the questions are systematically classified and organized along five core dimensions: computing power infrastructure, software architecture, data resources, application scenarios, and brain science. This work aims to provide readers with a comprehensive, in-depth, and cutting-edge AI knowledge framework to help people from all walks of life grasp the pulse of AI development, stimulate innovative thinking, and promote industrial progress.


Adaptive Deep Learning for Efficient Visual Pose Estimation aboard Ultra-low-power Nano-drones

Motetti, Beatrice Alessandra, Crupi, Luca, Elshaigi, Mustafa Omer Mohammed Elamin, Risso, Matteo, Pagliari, Daniele Jahier, Palossi, Daniele, Burrello, Alessio

arXiv.org Artificial Intelligence

Sub-10cm diameter nano-drones are gaining momentum thanks to their applicability in scenarios inaccessible to bigger flying drones, such as narrow environments and operation close to humans. However, their tiny form factor also brings their major drawback: ultra-constrained memory and processors for the onboard execution of their perception pipelines. Therefore, lightweight deep learning-based approaches are becoming increasingly popular, since computational efficiency and energy savings can make the difference between a fully working closed-loop system and a failing one. In this work, to maximize the exploitation of the ultra-limited resources aboard nano-drones, we present a novel adaptive deep learning-based mechanism for the efficient execution of a vision-based human pose estimation task. We leverage two State-of-the-Art (SoA) convolutional neural networks (CNNs) with different regression-performance vs. computational-cost trade-offs. By combining these CNNs with three novel adaptation strategies, based on the output's temporal consistency and on auxiliary tasks, to proactively swap the CNN being executed, we present six different systems. On a real-world dataset and the actual nano-drone hardware, our best-performing system, compared to executing only the bigger and most accurate SoA model, shows a 28% latency reduction while keeping the same mean absolute error (MAE), a 3% MAE reduction at iso-latency, and the absolute peak performance, i.e., 6% better than the SoA model.
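As one possible reading of the temporal-consistency adaptation strategy described above: run the cheap CNN by default and escalate to the accurate one when successive pose estimates jump. A minimal Python sketch under that assumption; the threshold, model names, and exact swap rule are illustrative, not the paper's actual policy:

```python
import numpy as np

class AdaptivePoseEstimator:
    """Swap between a cheap and an accurate pose CNN based on how
    stable recent predictions are (illustrative policy)."""

    def __init__(self, small_cnn, big_cnn, threshold=0.15):
        self.small_cnn = small_cnn   # low-latency, less accurate model
        self.big_cnn = big_cnn       # high-latency, more accurate model
        self.threshold = threshold   # max tolerated frame-to-frame change
        self.prev_pose = None
        self.use_big = False

    def __call__(self, frame):
        model = self.big_cnn if self.use_big else self.small_cnn
        pose = np.asarray(model(frame))  # e.g. (x, y, z, yaw) of the subject
        if self.prev_pose is not None:
            jump = np.linalg.norm(pose - self.prev_pose)
            # Unstable output -> escalate to the big model next frame;
            # stable output -> fall back to the small model to save energy.
            self.use_big = jump > self.threshold
        self.prev_pose = pose
        return pose
```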


Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception

Wang, Xiao, Wu, Wentao, Li, Chenglong, Zhao, Zhicheng, Chen, Zhe, Shi, Yukai, Tang, Jin

arXiv.org Artificial Intelligence

Understanding vehicles in images is important for various applications such as intelligent transportation and self-driving systems. Existing vehicle-centric works typically pre-train models on large-scale classification datasets and then fine-tune them for specific downstream tasks. However, they neglect the specific characteristics of vehicle perception in different tasks and might thus lead to sub-optimal performance. To address this issue, we propose a novel vehicle-centric pre-training framework called VehicleMAE, which incorporates structural information, including the spatial structure from vehicle profile information and the semantic structure from informative high-level natural language descriptions, for effective masked vehicle appearance reconstruction. Specifically, we explicitly extract the sketch lines of vehicles as a form of spatial structure to guide vehicle reconstruction. We further distill more comprehensive knowledge from the large CLIP model, based on the similarity between paired/unpaired vehicle image-text samples, to help achieve a better understanding of vehicles. A large-scale dataset, termed Autobot1M, is built to pre-train our model; it contains about 1M vehicle images and 12,693 text descriptions. Extensive experiments on four vehicle-based downstream tasks fully validate the effectiveness of our VehicleMAE. The source code and pre-trained models will be released at https://github.com/Event-AHU/VehicleMAE.
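One plausible reading of an objective like VehicleMAE's is a composite loss over masked appearance reconstruction, sketch-line reconstruction, and alignment to a frozen CLIP embedding. A minimal PyTorch sketch, where the loss weights and tensor names are assumptions rather than the paper's exact formulation:

```python
import torch.nn.functional as F

def vehiclemae_style_loss(pred_patches, target_patches,
                          pred_sketch, target_sketch,
                          student_emb, clip_emb,
                          w_sketch=1.0, w_clip=1.0):
    """Illustrative composite objective: masked appearance reconstruction,
    sketch-line (spatial structure) reconstruction, and alignment to a
    frozen CLIP embedding (semantic structure). Weights are assumptions."""
    loss_recon = F.mse_loss(pred_patches, target_patches)   # masked pixels
    loss_sketch = F.mse_loss(pred_sketch, target_sketch)    # vehicle contours
    # Pull the student's image embedding toward CLIP's (cosine distance).
    loss_clip = 1 - F.cosine_similarity(student_emb, clip_emb, dim=-1).mean()
    return loss_recon + w_sketch * loss_sketch + w_clip * loss_clip
```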


Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Wang, Xiao, Chen, Guangyao, Qian, Guangwu, Gao, Pengcheng, Wei, Xiao-Yong, Wang, Yaowei, Tian, Yonghong, Gao, Wen

arXiv.org Artificial Intelligence

With the urgent demand for generalized deep models, many pre-trained big models have been proposed, such as BERT, ViT, and GPT. Inspired by the success of these models in single domains (like computer vision and natural language processing), multi-modal pre-trained big models have also drawn more and more attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper provides new insights and helps new researchers track the most cutting-edge works. Specifically, we first introduce the background of multi-modal pre-training by reviewing conventional deep learning and pre-training works in natural language processing, computer vision, and speech. Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-trained models (MM-PTMs), and discuss MM-PTMs with a focus on data, objectives, network architectures, and knowledge-enhanced pre-training. After that, we introduce the downstream tasks used for the validation of large-scale MM-PTMs, including generative, classification, and regression tasks. We also give visualizations and analysis of the model parameters and results on representative downstream tasks. Finally, we point out possible research directions for this topic that may benefit future works. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models: https://github.com/wangxiao5791509/MultiModal_BigModels_Survey


Unpacking the Ethical Value Alignment in Big Models

Yi, Xiaoyuan, Yao, Jing, Wang, Xiting, Xie, Xing

arXiv.org Artificial Intelligence

Big models have greatly advanced AI's ability to understand, generate, and manipulate information and content, enabling numerous applications. However, as these models become increasingly integrated into everyday life, their inherent ethical values and potential biases pose unforeseen risks to society. This paper provides an overview of the risks and challenges associated with big models, surveys existing AI ethics guidelines, and examines the ethical implications arising from the limitations of these models. Taking a normative-ethics perspective, we propose a reassessment of recent normative guidelines, highlighting the importance of collaborative efforts in academia to establish a unified and universal AI ethics framework. Furthermore, we investigate the moral inclinations of current mainstream LLMs using Moral Foundations Theory, analyze existing alignment algorithms, and outline the unique challenges encountered in aligning ethical values within them. To address these challenges, we introduce a novel conceptual paradigm for aligning the ethical values of big models and discuss promising research directions for alignment criteria, evaluation, and methods, representing an initial step towards the interdisciplinary construction of ethically aligned AI. This paper is a modified English version of our Chinese paper https://crad.ict.ac.cn/cn/article/doi/10.7544/issn1000-1239.202330553, intended to help non-Chinese native speakers better understand our work.
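For readers unfamiliar with the alignment algorithms this paper analyzes, one widely used member of that family is Direct Preference Optimization (DPO, Rafailov et al., 2023). A minimal PyTorch sketch of its loss, offered as general background rather than as this paper's own method:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO preference loss: increase the policy's margin for the
    human-preferred response over the rejected one, relative to a
    frozen reference model. Inputs are the summed log-probabilities
    of each response; beta controls deviation from the reference."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio)
    # Maximize log sigmoid of the margin, i.e. a logistic preference loss.
    return -F.logsigmoid(logits).mean()
```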


From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models

Yao, Jing, Yi, Xiaoyuan, Wang, Xiting, Wang, Jindong, Xie, Xing

arXiv.org Artificial Intelligence

Big models, exemplified by Large Language Models (LLMs), are models typically pre-trained on massive data and comprising an enormous number of parameters; they not only obtain significantly improved performance across diverse tasks but also present emergent capabilities absent in smaller models. However, the growing intertwining of big models with everyday human life poses potential risks and might cause serious social harm. Therefore, many efforts have been made to align LLMs with humans so that they better follow user instructions and satisfy human preferences. Nevertheless, `what to align with' has not been fully discussed, and inappropriate alignment goals might even backfire. In this paper, we conduct a comprehensive survey of the alignment goals in existing work and trace their evolution paths to help identify the most essential goal. In particular, we investigate related works from two perspectives: the definition of alignment goals and alignment evaluation. Our analysis encompasses three distinct levels of alignment goals and reveals a goal transformation from fundamental abilities to value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs. Based on these results, we further discuss the challenges of achieving such intrinsic value alignment and provide a collection of available resources for future research on the alignment of big models.